Confidence Measures for Error Correction in Interactive Transcription of Handwritten Text

Authors

  • Lionel Tarazón
  • Daniel Pérez
  • Nicolás Serrano
  • Vicente Alabau
  • Oriol Ramos Terrades
  • Alberto Sanchís
  • Alfons Juan-Císcar
Abstract

An effective approach to transcribing old text documents is to follow an interactive-predictive paradigm in which the system is guided by the human supervisor and, in turn, the supervisor is assisted by the system to complete the transcription task as efficiently as possible. In this paper, we focus on a particular system prototype called GIDOC, which can be seen as a first attempt to provide user-friendly, integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. More specifically, we focus on the handwriting recognition part of GIDOC, for which we propose the use of confidence measures to guide the human supervisor in locating possible system errors and deciding how to proceed. Empirical results on two datasets show that a word error rate of no more than 10% can be achieved by checking only the 32% of words recognised with the lowest confidence.
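The checking strategy described in the abstract can be illustrated with a minimal sketch. It assumes that every recognised word carries a confidence score and, for evaluation purposes only, a label saying whether it was misrecognised; the supervisor checks the least confident words first and corrects any errors among them. The function names, the `(confidence, is_error)` representation and the toy data are all hypothetical and do not come from GIDOC, and errors are treated as word substitutions only, whereas a full word error rate also counts insertions and deletions.

```python
from typing import List, Tuple

# Hypothetical representation of one recognised word: (confidence, is_error).
# Only substitution errors are modelled; a full WER would also count
# insertions and deletions, which this sketch ignores.
Word = Tuple[float, bool]


def residual_error_rate(words: List[Word], n_checked: int) -> float:
    """Fraction of words still wrong after the supervisor checks, and
    corrects, the n_checked words with the lowest confidence."""
    if not words:
        return 0.0
    # Rank words from least to most confident; the least confident
    # ones are the ones shown to the supervisor.
    ranked = sorted(words, key=lambda w: w[0])
    unchecked = ranked[n_checked:]
    # Checked words are assumed to be corrected, so only errors among
    # the unchecked words remain in the final transcription.
    remaining_errors = sum(1 for _, is_error in unchecked if is_error)
    return remaining_errors / len(words)


def min_fraction_for_target(words: List[Word], target: float) -> float:
    """Smallest fraction of lowest-confidence words that must be checked
    so that the residual error rate does not exceed target."""
    n = len(words)
    if n == 0:
        return 0.0
    for k in range(n + 1):
        if residual_error_rate(words, k) <= target:
            return k / n
    return 1.0  # only reached if target is negative


# Toy data (not from the paper): 10 words, 3 of them misrecognised.
TOY_WORDS: List[Word] = [
    (0.95, False), (0.90, False), (0.40, True), (0.85, False),
    (0.30, True), (0.70, False), (0.60, True), (0.99, False),
    (0.50, False), (0.80, False),
]

if __name__ == "__main__":
    print(residual_error_rate(TOY_WORDS, 0))           # 0.3
    print(residual_error_rate(TOY_WORDS, 2))           # 0.1
    print(min_fraction_for_target(TOY_WORDS, 0.10))    # 0.2
```

On this toy input, checking the two least confident words (20% of the text) already brings the residual error rate down to 10%; how well this works on real data depends entirely on how well the confidence measure separates correct from incorrect words.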

Similar resources

Effective balancing error and user effort in interactive handwriting recognition

Transcription of handwritten text documents is an expensive and time-consuming task. Unfortunately, the accuracy of current state-of-the-art handwriting recognition systems cannot guarantee fully automatic, high-quality transcriptions, so we need to revert to the computer-assisted approach. Although this approach reduces the user effort needed to transcribe a given document, the transcription of ...

TexT - Text Extractor Tool for Handwritten Document Transcription and Annotation

This paper presents a framework for semi-automatic transcription of large-scale historical handwritten documents and proposes TexT, a simple, user-friendly text extractor tool for transcription. The proposed approach provides quick and easy transcription of text using a computer-assisted interactive technique. The algorithm finds multiple occurrences of the marked text on-the-fly using a word sp...

The GIDOC Prototype

Transcription of handwritten text in (old) documents is an important, time-consuming task for digital libraries. In this paper, an efficient interactive-predictive transcription prototype called GIDOC (Gimp-based Interactive transcription of old text DOCuments) is presented. GIDOC is a first attempt to provide integrated support for interactive-predictive page layout analysis, text line detectio...

A Web-Based Demo to Interactive Multimodal Transcription of Historic Text Images

Paleography experts spend many hours transcribing historic documents, and state-of-the-art handwritten text recognition systems are not suitable for performing this task automatically. In this paper we present modifications to a previously developed interactive framework for transcription of handwritten text. This system, rather than full automation, aimed at assisting the user with the rec...

Character-Based Handwritten Text Recognition of Multilingual Documents

An effective approach to transcribe handwritten text documents is to follow a sequential interactive approach. During the supervision phase, user corrections are incorporated into the system through an ongoing retraining process. In the case of multilingual documents with a high percentage of out-of-vocabulary (OOV) words, two principal issues arise. On the one hand, a minor yet important matte...

Publication date: 2009